[30] P. L. Shaffer, “Minimization of Interprocessor Synchronization in Multiprocessors with Shared and Private Memory,” International Conference on Parallel Processing.

Authors

  • A. Kalavade
  • E. A. Lee
  • J. Van Ginderdeuren
  • G. Liao
  • G. R. Gao
  • V. K. Agarwal

Abstract

Glossary

  • $t(v)$: The execution time or estimated execution time of actor $v$.
  • UBS: Unbounded buffer synchronization. A synchronization protocol that must be used for feedforward edges of the synchronization graph. This protocol requires four synchronization accesses per iteration period.
  • $\rho(x, y)$: Same as $\rho_G(x, y)$, with the DFG $G$ understood from context.
  • $\rho_G(x, y)$: If there is no path in $G$ from $x$ to $y$, then $\rho_G(x, y) = \infty$; otherwise, $\rho_G(x, y) = \mathrm{Delay}(p)$, where $p$ is any minimum-delay path from $x$ to $y$.
  • $\mathrm{Delay}(p)$: Given a path $p$, $\mathrm{Delay}(p)$ is the sum of the edge delays over all edges in $p$.
  • $d_n(u, v)$: Represents an edge whose source and sink vertices are $u$ and $v$, respectively, and whose delay is equal to $n$.
  • $\lambda_{\max}$: Represents the maximum cycle mean of a DFG.
  • BBS: Bounded buffer synchronization. A synchronization protocol that may be used for feedback edges in a synchronization graph. This protocol requires two synchronization accesses per schedule period.
  • critical cycle: A fundamental cycle in a DFG whose cycle mean is equal to the maximum cycle mean of the DFG.
  • cycle mean: The cycle mean of a cycle $C$ in a DFG is equal to $\left(\sum_{v \in C} t(v)\right) / \mathrm{Delay}(C)$, that is, the sum of the execution times of all vertices on $C$ divided by the sum of the delays of all edges in $C$.
  • estimated throughput: Given a DFG with execution time estimates for the actors, the estimated throughput is the reciprocal of the maximum cycle mean, $1/\lambda_{\max}$.
  • feedback edge: An edge that is contained in at least one cycle.
  • feedforward edge: An edge that is not contained in a cycle.
  • maximum cycle mean: Given a DFG, the maximum cycle mean is the largest cycle mean over all fundamental cycles in the DFG.
  • SCC: Strongly connected component.
  • self-timed buffer bound: Given a feedback edge $e$ in a synchronization graph, the self-timed buffer bound is an upper bound on the number of tokens that can simultaneously reside on $e$ (the buffer size).
  • synchronization access: An access to shared memory that is used to update or examine the status of a synchronization variable.
  • synchronization cost: The average number of synchronization accesses that must be performed per iteration period in the self-timed implementation of a …
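The $\rho_G(x, y)$ entry is a shortest-path quantity in which edge delays play the role of edge lengths. A minimal sketch, assuming the DFG is supplied as `(u, v, delay)` triples with nonnegative delays, computes it with Dijkstra's algorithm. The function name `min_delay` and the input format are hypothetical, and treating the empty path from a vertex to itself as having delay 0 is an assumption, not something the glossary states.

```python
import heapq
import math
from collections import defaultdict


def min_delay(edges, x, y):
    """Sketch of rho_G(x, y): the minimum total delay over all paths from x to y.

    edges: iterable of (u, v, delay) triples, one per DFG edge d_n(u, v) with n = delay.
    Assumes nonnegative delays, so Dijkstra's algorithm applies.
    Returns math.inf when there is no path from x to y.
    """
    adj = defaultdict(list)
    for u, v, d in edges:
        adj[u].append((v, d))

    dist = {x: 0}    # best known Delay(p) from x to each reached vertex
    heap = [(0, x)]  # (tentative delay, vertex)
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist[u]:
            continue  # stale heap entry; a smaller delay was already found
        if u == y:
            return du  # the first time y is popped, its delay is minimal
        for v, d in adj[u]:
            nd = du + d
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf


# Example: A -> B -> C has total delay 0 + 1 = 1, beating the direct edge A -> C (delay 3).
edges = [("A", "B", 0), ("B", "C", 1), ("A", "C", 3), ("C", "A", 2)]
print(min_delay(edges, "A", "C"))  # 1
print(min_delay(edges, "B", "D"))  # inf (no path to D)
```

Because delays count initial tokens, they are nonnegative, which is what makes Dijkstra a reasonable choice here; graphs with negative weights would need Bellman-Ford instead.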
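The cycle-mean entries define $\lambda_{\max}$ as the largest ratio of total execution time to total delay over the cycles of a DFG, and the estimated throughput as its reciprocal. One generic way to estimate it, not necessarily the method used in the source, is a binary search on a candidate value $\lambda$: some cycle has a mean above $\lambda$ exactly when weighting each edge $u \to v$ by $\lambda \cdot \text{delay} - t(u)$ produces a negative cycle, which Bellman-Ford relaxation detects. The names `max_cycle_mean` and `has_cycle_above` are hypothetical, and the sketch assumes integer delays with at least one delay on every cycle, so every cycle mean is bounded by the total execution time.

```python
def max_cycle_mean(exec_time, edges, tol=1e-9):
    """Sketch: estimate lambda_max, the largest cycle mean of a DFG.

    exec_time: dict mapping each vertex v to t(v) (assumed nonnegative).
    edges: list of (u, v, delay) triples with integer delays.
    Assumes every cycle carries at least one delay; an acyclic graph yields a value near 0.
    """
    vertices = list(exec_time)
    n = len(vertices)

    def has_cycle_above(lam):
        # Weight u -> v by lam * delay - t(u). Summed around a cycle C this equals
        # lam * Delay(C) - (sum of t(v) for v on C), which is negative exactly
        # when the cycle mean of C exceeds lam.
        dist = {v: 0.0 for v in vertices}  # like a virtual source linked to every vertex
        for _ in range(n):
            changed = False
            for u, v, d in edges:
                w = lam * d - exec_time[u]
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    changed = True
            if not changed:
                return False  # distances settled: no negative cycle
        return True  # still improving after n rounds: a negative cycle exists

    lo, hi = 0.0, float(sum(exec_time.values()))
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if has_cycle_above(mid):
            lo = mid  # some cycle mean exceeds mid
        else:
            hi = mid  # no cycle mean exceeds mid
    return (lo + hi) / 2


t = {"A": 2, "B": 3, "C": 1}
edges = [("A", "B", 0), ("B", "A", 1), ("B", "C", 0), ("C", "B", 2)]
lam_max = max_cycle_mean(t, edges)  # approximately 5.0: cycle A-B has mean 5 / 1, cycle B-C has 4 / 2
print(lam_max, 1 / lam_max)         # estimated throughput is approximately 0.2
```

A cycle whose mean equals the returned value is a critical cycle in the glossary's sense; in the example that is the A-B cycle.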
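The feedback/feedforward distinction determines which protocol an edge can use and therefore how many synchronization accesses it costs. An edge $u \to v$ lies on a cycle exactly when $u$ is reachable from $v$, which is the same as saying $u$ and $v$ belong to the same SCC. The sketch below checks this with a breadth-first search and then totals four accesses per feedforward edge (UBS) and two per feedback edge (BBS). It assumes every edge passed in is a synchronization edge, that every feedback edge actually uses BBS, and that the schedule period and iteration period coincide; `classify_edges` and `sync_cost` are hypothetical names.

```python
from collections import defaultdict, deque

UBS_ACCESSES = 4  # per iteration period, for a feedforward edge (glossary: UBS)
BBS_ACCESSES = 2  # per schedule period, for a feedback edge (glossary: BBS)


def reaches(adj, src, dst):
    """Breadth-first search: True if dst is reachable from src (src reaches itself)."""
    seen = {src}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return True
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False


def classify_edges(edges):
    """Split (u, v) edges into (feedback, feedforward) lists, preserving input order."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    feedback, feedforward = [], []
    for u, v in edges:
        # u -> v lies on a cycle exactly when u can be reached again from v,
        # i.e. when u and v belong to the same SCC (a self-loop counts).
        if reaches(adj, v, u):
            feedback.append((u, v))
        else:
            feedforward.append((u, v))
    return feedback, feedforward


def sync_cost(sync_edges):
    """Accesses per period, assuming UBS on feedforward edges and BBS on feedback edges."""
    feedback, feedforward = classify_edges(sync_edges)
    return UBS_ACCESSES * len(feedforward) + BBS_ACCESSES * len(feedback)


edges = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "D")]
print(classify_edges(edges))  # ([('A', 'B'), ('B', 'A')], [('B', 'C'), ('C', 'D')])
print(sync_cost(edges))       # 4 * 2 + 2 * 2 = 12
```

The per-edge search keeps the sketch short; for large graphs, computing SCCs once (for example with Tarjan's algorithm) and comparing component labels would avoid repeated traversals.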

Similar sources

Experiences with Data Distribution on NUMA Shared Memory Multiprocessors

The choice of a good data distribution scheme is critical to performance of data-parallel applications on both distributed memory multiprocessors and NUMA shared memory multiprocessors. The high cost of interprocessor communication in distributed memory multiprocessors makes the minimization of communications the predominant issue in selecting data distribution schemes. However, on NUMA multipro...

Performance of Parallel Branch and Bound Algorithms on the KSR1 Multiprocessor

In this paper we consider the parallelization of the branch and bound (BB) algorithm with best-first search strategy on the KSR1 shared-memory multiprocessor. Two shared-memory parallel BB algorithms are implemented on a 56-processor system. Measurements indicate that the scalability of the two algorithms is limited by the cost of interprocessor communications and by the cost of synchronization....

On Automatic Loop Data-Mapping for Distributed-Memory Multiprocessors

In this paper we present a unified approach for compiling programs for Distributed-Memory Multiprocessors (DMM). Parallelization of sequential programs for DMM is much more difficult to achieve than for shared memory systems due to the exclusive local memory of each Virtual Processor (VP). The approach presented distributes computations among VPs of the system and maps data onto their private m...

Automatic Partitioning of Data and Computations on Scalable Shared Memory Multiprocessors

This paper describes an algorithm for deriving data and computation partitions on scalable shared memory multiprocessors. The algorithm establishes affinity relationships between where computations are performed and where data is located based on array accesses in the program. The algorithm then uses these affinity relationships to determine both static and dynamic partitions for arrays and par...

Architectural and Software Support for Executing Numerical Applications on High Performance Computers

Numerical applications require large amounts of computing power. Although shared memory multiprocessors provide a cost-effective platform for parallel execution of numerical programs, parallel processing has not delivered the expected performance on these machines. There are two crucial steps in parallel execution of numerical applications: (1) effective parallelization of an application and (2) ...

Publication date: 1968